base estimator
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada (0.04)
- (2 more...)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Oceania > New Zealand > North Island > Waikato (0.04)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
PRISM: Enhancing Protein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations
Mahbub, Sazan, Kundu, Souvik, Xing, Eric P.
Designing protein sequences that fold into a target three-dimensional structure, known as the inverse folding problem, is central to protein engineering but remains challenging due to the vast sequence space and the importance of local structural constraints. Existing deep learning approaches achieve strong recovery rates, yet they lack explicit mechanisms to reuse fine-grained structure-sequence patterns that are conserved across natural proteins. We present PRISM, a multimodal retrieval-augmented generation framework for inverse folding that retrieves fine-grained representations of potential motifs from known proteins and integrates them with a hybrid self-cross attention decoder. PRISM is formulated as a latent-variable probabilistic model and implemented with an efficient approximation, combining theoretical grounding with practical scalability. Across five benchmarks (CATH-4.2, TS50, TS500, CAMEO 2022, and the PDB date split), PRISM establishes new state of the art in both perplexity and amino acid recovery, while also improving foldability metrics (RMSD, TM-score, pLDDT), demonstrating that fine-grained multimodal retrieval is a powerful and efficient paradigm for protein sequence design.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (2 more...)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Oceania > New Zealand > North Island > Waikato (0.04)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation
Lalou, Yanis, Gnassounou, Théo, Collas, Antoine, de Mathelin, Antoine, Kachaiev, Oleksii, Odonnat, Ambroise, Gramfort, Alexandre, Moreau, Thomas, Flamary, Rémi
Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift. While many methods have been proposed in the literature, fair and realistic evaluation remains an open question, particularly due to methodological difficulties in selecting hyperparameters in the unsupervised setting. With SKADA-Bench, we propose a framework to evaluate DA methods and present a fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment. Realistic hyperparameter selection is performed with nested cross-validation and various unsupervised model selection scores, on both simulated datasets with controlled shifts and real-world datasets across diverse modalities, such as images, text, biomedical, and tabular data with specific feature extraction. Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications, with key insights into the choice and impact of model selection approaches. SKADA-Bench is open-source, reproducible, and can be easily extended with novel DA methods, datasets, and model selection criteria without requiring re-evaluating competitors. SKADA-Bench is available on GitHub at https://github.com/scikit-adaptation/skada-bench.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- (4 more...)
Evaluating the Effectiveness of Index-Based Treatment Allocation
Boehmer, Niclas, Nair, Yash, Shah, Sanket, Janson, Lucas, Taneja, Aparna, Tambe, Milind
When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies -- that allocate a fixed number of resources to those who need them the most -- by using data from a randomized control trial. Such policies create dependencies between agents, which render the assumptions behind standard statistical tests invalid and limit the effectiveness of estimators. Addressing these challenges, we translate and extend recent ideas from the statistics literature to present an efficient estimator and methods for computing asymptotically correct confidence intervals. This enables us to effectively draw valid statistical conclusions, a critical gap in previous work. Our extensive experiments validate our methodology in practical settings, while also showcasing its statistical power. We conclude by proposing and empirically verifying extensions of our methodology that enable us to reevaluate a past randomized control trial to evaluate different ML allocation policies in the context of a mHealth program, drawing previously invisible conclusions.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > India (0.04)
- North America > United States > Wisconsin (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (0.67)
FairGridSearch: A Framework to Compare Fairness-Enhancing Models
Ma, Shih-Chi, Ermakova, Tatiana, Fabian, Benjamin
Machine learning models are increasingly used in critical decision-making applications. However, these models are susceptible to replicating or even amplifying bias present in real-world data. While there are various bias mitigation methods and base estimators in the literature, selecting the optimal model for a specific application remains challenging. This paper focuses on binary classification and proposes FairGridSearch, a novel framework for comparing fairness-enhancing models. FairGridSearch enables experimentation with different model parameter combinations and recommends the best one. The study applies FairGridSearch to three popular datasets (Adult, COMPAS, and German Credit) and analyzes the impacts of metric selection, base estimator choice, and classification threshold on model fairness. The results highlight the significance of selecting appropriate accuracy and fairness metrics for model evaluation. Additionally, different base estimators and classification threshold values affect the effectiveness of bias mitigation methods and fairness stability respectively, but the effects are not consistent across all datasets. Based on these findings, future research on fairness in machine learning should consider a broader range of factors when building fair models, going beyond bias mitigation methods alone.
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Germany > Berlin (0.05)
- North America > United States > Massachusetts (0.04)
- (6 more...)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)